ABSTRACT

Perspectives on the evaluation of college teaching are considered.
Some of the conditions that are needed generally include the
following: the existence of an evaluation system that has a process
for evaluating research and service, a clear definition of the
difference between the evaluation of worth and the evaluation of
merit, and a separate support system to assist faculty in improvement
efforts. The following are not recommended for evaluating teaching:
classroom visits by colleagues, focusing only on measuring the amount
learned, and checking on the relative number of further courses in
the same subject matter in which the instructor's former students
enroll. It is suggested that the key component in the evaluation
process is the student questionnaire, which should allow for an
overall judgment of the merit of the instructor as an instructor.
Other possible questions concern ethical or professional obligations
of the instructor and components of instruction, such as ratings of
the text, quizzes, and the grading system. It is suggested that
instructional materials, tests, student grading, and the student's
role in grading should be examined in terms of comprehensiveness,
correctness, and currency. Other ways that teaching can be evaluated
include assessment of actual learning gains and faculty
self-development and self-improvement efforts. The use of faculty
evaluation in a faculty development process, integrating and
disseminating the results of faculty evaluations, and the key
ethical/legal constraints on data gathering are also addressed.
I. Introduction

The normal procedures for evaluating college teaching are so shoddy
at the intellectual and the practical level that it is hardly
surprising that teaching is rarely rewarded in an appropriate way.
That these procedures are intellectually sloppy is a reflection on
both the scholars and the administrators of the academy, and that
their methods of application are also slipshod tells us something
more about the true value system of the academy which is not to its
credit. Had colleges spent one percent of the time on this problem
that they have spent on [illegible]-funded research in the sciences
or the humanities, we would now have something to be proud of instead
of a source of shame. What follows has only the status of preliminary
notes about the problems and their solution and easy ways to improve
it greatly. I have tried to indicate the reasons why even these
preliminary considerations clearly entail the rejection of
essentially all existing systems.

II. Contextual Prerequisites

It is not only pointless but improper to 
proceed with a system for evaluating teach- 
ing that is supposed to operate well and eth- 
ically regardless of the administrative and 
legal context. The following are some of the 
conditions that must be present in that 
context. 

A. The evaluation of teaching must be part of a system which also has
an appropriate process for evaluating research and service, and a
specific commitment to their relative weighting. Otherwise one is
playing in a game for which only a few of the rules have been stated.
It is notable that, for example, the University of Texas' system and
the University of California at Berkeley's system have no such
commitment to the relative worth of teaching. The fact that UCB
requires, and enforces the requirement, that data on teaching,
including student evaluations, must be included in a dossier before
it will be considered for personnel decisions seems to show a valuing
of teaching. It does not. The data may show that the teacher is a bad
teacher, yet the "data requirement" does not result in a penalty; and
if it shows s/he is a good teacher, no determinate benefit results.
Specific relative weighting of all relevant criteria of faculty
assessment is the only non-vacuous procedure, and not incidentally
the only equitable one. It is hardly surprising that in the two
systems mentioned (and most others) the standards vary by campus and
department.
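
As an illustration only, the idea of a stated relative weighting can
be sketched in a few lines of Python. The weights, the 1-5 rating
scale, and the function name below are all hypothetical assumptions
made for the sketch; the text specifies none of them. The point is
simply that once weights are announced in advance, a rating on
teaching has a determinate effect on the overall assessment.

```python
def composite_score(ratings, weights):
    """Weighted average of per-criterion ratings under pre-announced weights.

    Both arguments are dicts keyed by criterion name. The criteria and
    weights are illustrative assumptions, not values taken from the text.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same criteria")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[c] * weights[c] for c in weights) / total


# Hypothetical weights an institution might publish in advance.
WEIGHTS = {"teaching": 0.4, "research": 0.4, "service": 0.2}

# Two hypothetical faculty members rated on a 1-5 scale.
strong_teacher = {"teaching": 5.0, "research": 3.0, "service": 3.0}
strong_researcher = {"teaching": 3.0, "research": 5.0, "service": 3.0}

if __name__ == "__main__":
    for name, ratings in [("strong teacher", strong_teacher),
                          ("strong researcher", strong_researcher)]:
        print(name, round(composite_score(ratings, WEIGHTS), 2))
```

Under these assumed weights the two profiles come out equal; if the
teaching weight is set to zero, the strong teacher falls behind, which
is the situation a "data requirement" without a weighting commitment
leaves open.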

B. You must have a system of administrator evaluation in place in
order to avoid the entirely justifiable resistance of the "serfs" to
being evaluated by the "folk in the castle" who are above such things
themselves. Administrator evaluation, as it is commonly done at
universities and colleges, is a truly remarkable phenomenon in that
it succeeds in being even worse than the evaluation of faculty, and
it occurs in even fewer cases.

C. The institution must have clearly un- 
derstood and defined the difference between 
the evaluation of worth and the evaluation 
of merit, with respect to the evaluation of 
teaching in particular. (The distinction also 
applies to the evaluation of research and 
service.) There are a half dozen areas where
the impact of this distinction is crucial, but 
I'll just mention two. Suppose that you have 
a teacher whose teaching is superlative by 
every reasonable standard, and who is also 
producing substantial quantities of abso- 
lutely first-rate research, while rendering 
impressive service to the profession, the 
campus and the community. If you are not, 
as an institution, absolutely clear that in 
spite of all this you may have to deny tenure 
to the teacher, you do not understand the 
difference between worth and merit. The 
decision whether to make an initial appointment, the decision on
granting tenure, and the decision on pushing for early retirement by
the various means available for that, should all depend upon the
worth of a faculty member to the institution and not merely upon the
faculty member's professional merit. The worth to the institution is
essentially connected with such issues as the income generated by the
student enrollment that the instructor produces, the special services
to the institution's general mission that are provided by the
instructor (e.g. a uniquely talented Latin scholar or teacher at a
Jesuit institution, a woman mathematician in an otherwise all-male
department that is anxious to recruit women as mathematics majors
etc.). Neither private nor public institutions can today afford to be
awarding tenure on merit alone as if the enrollments were irrelevant,
but many of them are locked into a system of faculty evaluation which
makes it impossible to deviate from merit considerations even in
extreme cases (except at the appointment decision). If worth is to be
given any weighting at all, as it should be, then the exact size and
limit of this weighting must be as carefully defined as possible in
order that one does not get into obvious abuses such as firing
faculty members whose political activities lead to the loss of large
donations by alums or legislatures, the sort of abuse which results
from thinking that "worth" means financial worth. It's not that worth
doesn't have anything to do with financial worth (as in the
enrollment example), it's just that it also has something to do with
the integrity of the mission of the school and there is no worth left
in an institution that abandons appropriate professional standards in
higher education in order to achieve financial worth.

D. There should be an independent support system of some kind
available to faculty to assist them in the effort to improve, so that
the system of faculty evaluation is not merely punitive, and not seen
as merely punitive (in which case it is unlikely to be viable). The
system of independent support for improvement must be independent of
the department chairs, the deans and the peers in order to avoid the
disincentive to use it that would result if instructors know that
administrators will know or can find out when it is being used by a
particular instructor. Hence the consultations must be absolutely
confidential as well as professionally based. Finding somebody who
can provide the appropriate kind of assistance is very difficult
because there aren't many who might be said to be qualified in the
first place, and a large slice of that group is much too inclined to
think that there is one true solution to the problem of how best to
teach. The cost of providing this support system is low. One
professional plus one secretary/assistant per 10,000 students can do
this and also a number of other jobs that should be done on any
campus such as keeping up with the research and practice in the field
of faculty evaluation and teaching research, maintaining a small
consulting library for faculty, improving the student questionnaire
and managing its administration and data synthesis phases. This
involves research-reading competence; but the helping role that works
with the prima donna type of faculty is not something that prima
donna types of teaching researchers are necessarily used to, so a
nice balance of skills is required. It would be foolhardy to make
appointments to this job for more than a trial period at first.

E. The background system must be one which has consistent and
appropriate practices, not just consistent and appropriate rhetoric.
A typical example of how not to achieve consistent practice is to
have a system in which the quality of teaching is said to be
important (which involves, amongst other things, the quality of the
assessment of student work) while in fact departments are issued new
positions and replacements for retiring or departing old appointments
on the basis of enrollments. While this is admirable in the sense
that it reflects an attention to a reasonable consideration that a
decade or two ago was almost totally disregarded, it is absurd in
that in many contexts it amounts to rewarding departments for,
amongst other unattractive things, easy grading, which is
inconsistent with the rhetoric of respect for high standards. Another
example is the use of a student questionnaire which attends to the
extent to which instructors get to know the names and personalities
of individual students, provide ample office hours etc., all of which
are results easily achieved for small classes but not for large
classes; and then to complain about declining enrollments, which this
approach reinforces. Consistent systems are hard to set up, but the
inconsistencies in systems are extremely expensive.

F. A reasonable modus vivendi with student government must be worked
out for mutual benefit from student evaluation. (See section VI of
this paper.)

III. The Definition of Good Teaching

The best teaching is not that which produces the most learning. No
definitions in the literature avoid obvious counter-examples. The
following definition avoids the so-far-identified counter-examples,
but possibly has some of its own. The ideal teacher is one who has
the maximal possible influence towards beneficial learning on the
part of his or her students, subject to three conditions: 1) the
teaching process used is ethical, 2) the curriculum coverage and
process is consistent with what has been promised, 3) the teaching
process and its foreseeable effects are consistent with appropriate
institutional and professional goals and obligations.

Perhaps we can best illustrate what is intended by the three
qualifications with some examples of what they exclude. 1) Unethical
processes include not only cruel and unusual punishment but also
those processes which are completely non-generalizable, e.g. putting
so much pressure on the students that they abandon their work for
other classes in order to do the work for this one. It is often
argued on weak a priori grounds that the use of a reward system such
as a token economy or grades is unethical; in fact, failure to use
such a "reward system" is both unethical and unprofessional. It is
unethical because it fails to inform the students about their
progress or competence and/or it fails to show that the institution
cares about quality work as opposed to minimally competent work etc.;
and it is unprofessional because it keeps important information about
the student's progress from the student and the student's advisors,
parents and potential employers. It also happens to provide a
legitimate incentive for many students, and not to use it is thus
poor pedagogy unless you have direct evidence for a better approach.
(Grading can even be arranged so as to make inter-student
competitiveness impossible, while retaining the feedback required in
striving for excellence.) So, even if grading reduced learning, it
would have to be done and done properly. But properly managed it
should increase learning.

2) Certain contracts are made by instructors and institutions, both
explicitly and implicitly, to which they frequently do not adhere.
Such contracts are involved in promises as to what a course will
cover, made either in a catalogue, the departmental handouts, the
faculty handbook, the union contract, the class handouts or in the
language of the opening presentation in class. Apart from the simple
sin of misleading advertising, much of the hierarchical structure of
sequential curricula has been made laughable by the failure of
instructors to adhere to these standards. Maximizing learning cannot
be given precedence over these obligations.

3) It would usually be appropriate, if the maximization of learning
were the only obligation of an instructor, to adjust the level of
instruction to the class average so that more people would be able to
benefit from it. But there are institutional and professional
commitments that transcend this commitment. For example, if one is
faced with a class of medical students who will graduate at the end
of this year, and if one's obligation is to instruct them in
professional procedures that they will be practicing upon an
unsuspecting public a year from now, and if it is clear that most of
them are so far off the pace that no instruction within the time
available can get those students up to a level of competence, then it
is professionally obligatory to focus the instruction upon those few
who can be brought up to the appropriate standard and flunk the rest
if they don't make it. Other obligations that are important and must
be considered very seriously here include the problems of providing
compensatory justice for minorities and women (which may mean
focusing more effort there than shows up in maximizing classroom
learning), and providing knowledge that is going to be expected by
the instructor of a higher level course etc.
Some assorted notes that bear on the preceding points include the
following. First, the "learning" referred to in the definition can
certainly cover attitudes, in the rare cases where it is possible to
justify the claim that teaching a certain attitude is an appropriate
part of an instructor's obligations. Teaching the scientific attitude
might be a case in point, as might be teaching the motivation to
learn. It must also be remembered that teaching can often be done and
often is done to a very important extent outside the classroom; hence
evaluating it must involve looking for and at this out-of-classroom
teaching.

Perhaps the most important feature of this definition of teaching is
that it does not identify good teaching with the production of good
learning. Considerations such as those mentioned preempt such a naive
equation (one that I rather fancied in my earlier thinking about this
subject); but there is yet another type of example that should be
mentioned. If you have a number of students in the class who, for
various reasons not connected with one's own performance, are not
working hard enough to keep up with the class, one should not be
downgraded as a teacher for the failure of these students to learn. A
teacher's task is only to provide the best possible environment, not
to guarantee that the results will be effective no matter how little
effort is made by the students. It follows that one must never use on
a student questionnaire the question: "How much did you learn from
this class (that you think was valuable to you)?" or cognates of that
question. One must instead use questions such as "How well do you
think the instructor taught the course?" It is partly for this reason
that the answers to student questionnaires are not to be regarded as
intrinsically inferior to studies of the learning gains of students
in class. They are more valid in one respect, in that they address
the correct question.

A second incorrect definition of good teaching which has some
supporters involves the identification of teaching with the transfer
of learning from the teacher to the student. But of course, as has
often been pointed out, inspiring the student to seek learning
elsewhere may be the best approach to maximizing learning even in the
short run and more likely in the long run.

IV. How Not to Evaluate Teaching

A. It will be clear from the preceding discussion that any way of
evaluating teaching that simply consists in measuring the amount
learned is oversimplified and can be extremely unfair to some
teachers. It's also clear that evaluating a teacher's worth only or
merit only will on occasions be entirely inappropriate. Let us turn
quickly to a number of other errors that have been widely
incorporated into quasi-systems for evaluating teachers.

B. Classroom Visits. Using classroom visits by colleagues (or
administrators or "experts") to evaluate teaching is not just
incorrect, it is a disgrace. Some of the reasons for this conclusion
follow. First, the visit itself alters the teaching, so that the
visitor is not looking at a random sample. Second, the number of
visits is too small to be an accurate sample from which to
generalize, even if it were a random sample. Third, the visitor is
not devoid of independent personal prejudices in favor of or against
the teacher, arising from the fact that the visitor is normally an
administrator or colleague of the teacher and in his/her other role
is involved in adversary proceedings, alliances, etc., with the
teachers. Fourth, even if none of the preceding conditions make the
whole affair ridiculous, which each of them does independently, there
is nothing that could be observed in the classroom (apart from the
most bizarre special cases) which can possibly be used as a basis for
an inference about the merit of the teaching. That this is so follows
inexorably from the results of the enormous number of studies on
style research that have been done and summarized on various
occasions, the sum total of these resulting in the conclusion that
there are no style indicators that can be said to reliably correlate
with learning now or in the future by the students in the classroom.
Fifth, regardless of the fact that no observations of teaching style
can legitimately be used as a basis for inference to the merit of the
teaching, the visitor normally believes the contrary. This is often
because the visitor has his or her own preferences as to a certain
style, or has many years of experience in teaching this same type of
course, and consequently believes that not doing it this way or in
one or two other ways that are approved is doing it badly. These
prejudices are totally without foundation, to the best of our
knowledge, and should not be allowed to come into the evaluation of
teaching at all. (Among the rare exceptions are the possibility that
what the teacher is saying is known to be false by the visitor, and
so grossly false as to constitute an impossible vehicle for teaching
the truth; that the visitor observes racist or sexist or other
immoral practices by the teacher; or that the visitor observes a
total lack of classroom discipline to a degree which cannot possibly
be reconciled with the continuance of a learning process of any kind.
Since none of these events has ever been recorded on any of the
classroom visiting sheets of the many thousands that I have either
inspected directly or of which I have seen summaries, this cannot be
taken seriously as a reason for making classroom visits part of
teacher evaluation.)

Ultimately, the problem about the visitor is the lack of similarity
between the visitor's head and the student's head. They are generally
separated by several decades in their learning maturity, thus they
may have substantially different vocabularies and conceptual
repertoires, and they certainly lack much cultural similarity;
because of this, the visitor is not a good indicator of how much
learning is going on in the heads of the students. Since the
secondary indicators of teaching style (i.e. everything besides
empathy) turn out to be invalid, this leaves the visitor, or should
leave the visitor, up the proverbial gum tree. I need hardly stress
the fact that in spite of these staggering objections, the method of
classroom visitation is the universal method whereby teachers in the
elementary and high school are evaluated, and is, depressingly
enough, being quite steadily implemented at the post-secondary level,
on the grounds that it represents an improvement. No variation of
this is of any value whatsoever: visits by experts are no better,
visits by peers are no better than visits by administrators, with
respect to the personnel evaluation process. There is a very limited
role in which visits by a consultant are defensible in the effort to
provide help to improve teaching if it is already known that the
teaching is very unsuccessful. This is simply because the consultant
may be able to suggest some options, not optimal alternatives, but
just familiar ones. But it's usually unnecessary to have the (costly)
classroom visit; a tape recording or even a verbal report by the
teacher plus the student evaluation forms is more than enough basis
for consulting recommendations in most cases.

C. Teaching is usually evaluated without any serious attempt at
evaluating the quality of the content of the course. This is the
"methods-madness" that has made schools of education a laughingstock
in the intellectual community, and increasingly in the total
community, for many years. From what we have said about the
definition of good teaching, it is clear that one cannot make one's
judgment of teaching merit entirely on the basis of the content of
what is learned by the students, but to do so is much better than
making the judgment on the basis of merely inspecting what is
presented to students. Both should be considered, but with an
emphasis upon student performance. Here is the one place where peer
evaluation of a limited kind is appropriate (but evaluation of
materials, not process), and even here it is better to use people
from another institution (but in the same subject-matter area),
eliminating costs by trading services in kind with that institution.
Such an arrangement, with its social pressure, tends to do much more
to improve standards than in-house materials evaluation does anyway,
apart from increased validity.

D. Another popularly acclaimed though infrequently employed measure
of teaching success is checking on the relative number of further
courses in the same subject matter in which "graduates" of the
instructor being evaluated enroll. This is an unethical indicator and
its use cannot be countenanced by central administration because it
is non-generalizable. That is, one can only score points on this
dimension by stealing, buying or seducing students from other
departments. It's exactly like the process of getting more work out
of the students by having them give up on their homework for other
courses. It's a great indicator if you treat the welfare of the
discipline as the summum bonum, rather than the welfare of the
institution or of higher education or of the humanities or of the
society; or of the student (with a minor qualification).

E. Most student questionnaires are improper bases for faculty
evaluation, because they generally involve ratings on style or
ratings on non-generalizable indexes etc. But the way in which the
data from them is synthesized is also a problem. If you only use mean
ratings, then you overlook the important case of the instructors who
are tremendously successful with a sub-group of the class. Although
their mean score may be no different from that of another instructor
who receives weak ratings from everybody in the class, their teaching
potential is obviously different and a sensitive administration
should be making some provisions to tap such a promising source of
inspiration in a more appropriate situation than the particular class
that resulted in a low overall score. Or the consultant may be able
to work out a way to generalize from this promising basis to the
other students. Similarly, averaging the means from courses taught at
different levels may be unfair to good teachers of introductory (or
graduate) courses.
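
The point about mean ratings can be made concrete with a small
Python sketch. The two sets of ratings, the 1-5 scale, and the cutoff
of 4 for a "high" rating are invented for illustration and are not
data from any study; the function name is likewise hypothetical. The
sketch reports, alongside the mean, the share of students who gave a
high rating, which is one simple way (among many) to keep the
successful sub-group visible.

```python
from statistics import mean


def summarize(ratings, top=4):
    """Return (mean rating, share of students rating at or above `top`).

    `top` is an assumed cutoff for a "high" rating on a 1-5 scale.
    """
    share_high = sum(1 for r in ratings if r >= top) / len(ratings)
    return mean(ratings), share_high


# Hypothetical class in which every student gives a middling rating.
uniform_weak = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

# Hypothetical class in which half the students rate the instructor
# very highly and the other half very poorly.
split_class = [5, 5, 5, 5, 5, 1, 1, 1, 1, 1]

if __name__ == "__main__":
    # Both classes have a mean of 3, but the share of high ratings
    # is 0.0 in the first and 0.5 in the second.
    print(summarize(uniform_weak))
    print(summarize(split_class))
```

A report that carried only the first number of each pair would treat
the two instructors as indistinguishable; the second number is what
flags the one who is reaching part of the class very well.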

There are a number of other ways not to 
do the job that are widely used, but let's 
turn our attention to at least a preliminary 
sketch of how the job might be done 
properly. 

V. How Teaching Should Be Evaluated 

A. The key component in the evaluation process is the student
questionnaire, but the piece of paper itself is only part of the
story. The support system has got to provide methods for
administering this that are bullet-proof against complaints about the
possibility of selective return and prompting. A good straightforward
approach is to have the central administration staff (a cheaper
fallback system is using the departmental secretaries) take the
questionnaires out to each class, have the professor leave the room
for a few minutes, provide a brief explanation of the process and how
the results are to be used (possibly in writing), encourage
questions, and pass out questionnaires to be filled out by every
member of the class. One should get a 99 percent return rate from
those present. Where knowing that the questionnaire will be
distributed on that date is an incentive, the date for distribution
should be set in advance if one wants most of those enrolled in the
class to attend. (This does weaken one's defenses against certain
types of preparation of the students by the professor, but this can
be controlled by asking about it on the form so it is a less worrying
problem than low attendance rates.) "Absentee ballots" are not a good
idea, logistically or for inferential purposes. One can get that kind
of response rate, and tolerance from the faculty for taking time out
of their classes, if the whole process takes less than about 5
minutes. This is the first of two compelling reasons for using a very
short questionnaire.

The second is that there is not the slightest justification for using
a long one. The usual questionnaires with twenty to seventy items on
them are fishing expeditions of an entirely improper kind. Unless you
can say of a particular question that it is demonstrable that an
answer of a particular kind indicates merit or demerit in teaching,
then it has no place on the personnel evaluation questionnaire. Other
and more detailed questions may be used in a questionnaire designed
by the instructor with the assistance of the consultant (or perhaps
the other way around), when an instructor wants to get specific
feedback on some specific effort or style venture in which he or she
is interested. But no such commitments can be legitimated for general
evaluative purposes and no such questions should ever be on a form
which is seen by anybody except the instructors and consultants of
their choice. If placed on a form along with legitimate questions
they will be likely to bias the response of somebody who favors or
disfavors the style that they uncover, and such biases are
illegitimate. (They would be certain to seriously affect the
legitimacy of any personnel decisions that were appealed in court,
for example; but the ethical point is more serious than that.)

Since one cannot help but be worried by the contamination of
instructor ratings by irrelevant personality factors and by factors
connected with like or dislike of the course itself, one should make
some effort to siphon off those other considerations. A simple way is
to stress the contrast between these considerations in the
introduction to the basic judgmental question on the questionnaire,
i.e. between liking and disliking the instructor as a person and
thinking well of him as an instructor; and between liking and
disliking the course content (of a required course) and thinking that
the instructor did or did not do a good job of teaching it. Another
approach to siphoning would consist in asking specifically for 1) an
evaluation of the instructor as a person, and 2) of the course
itself, and then asking for 3) an evaluation of the job done by the
instructor in teaching this course. The course evaluation could then
be torn off the form and sent to the chair and/or dean for course
evaluation; the personal evaluation could be torn off and thrown away
or given to the instructor if she or he requested it; and one would
then use the rest of the form for personnel evaluation.

The crucial question in the rest of the form is a request for the
overall judgment of the merit of the instructor as an instructor.
There are some other possible questions on such a form, of a kind not
often encountered. They should come before the request for an overall
rating since they are likely to depress it (and increase its
validity), which is fine because the main problem with the usual
results is that the scores are too high to allow adequate headroom in
which exceptionally good teaching can distinguish itself. These other
(possible) questions are of two kinds. The first concerns matters
that can be described as ethical or professional obligations of the
instructor and can be phrased in either a positive or a negative way,
or, better, alternately. They may concern such matters as the match
between the pre-announced content of the course and the actual
content; the match between the content of the course and the content
of the examinations; the extent to which the reasons for grades were
explicitly and adequately justified; the extent to which the
possibility of appealing a grade was explained; the frequency with
which scheduled class and office hours were met; the use of racist or
sexist or other bigoted remarks or materials; and so on. This is the
so-called "Black Marks list" (when phrased negatively) and it is
surprising the extent to which it reminds instructors as well as
students of the minimum professional obligations of the instructor,
obligations that can rather easily be discharged and which one
constantly finds are an underlying cause of dissatisfaction and bad
holistic ratings.

The second kind of legitimate question
consists in a listing of the components of
instruction for independent "micro-assessment,"
i.e. "please rate the catalog description,
text, quizzes, grading system, class
handouts, sections or labs or fieldwork,
etc." I now feel that a form with these
component-evaluation questions plus the
macro-question, plus an optional request
for suggested improvements in the form and
process of administering (using) it, is the best
basic form, though one can reduce the effort
by using only the macro-evaluation, going
to component analysis only when the instructor
requests it or is seriously and regularly
below the mean for comparable
courses. At that point, where remedial action
is necessary, the components evaluation
is an obvious help. When a crucial personnel
decision has to be made, I think that the
"professionality" questionnaire provides a
useful and valid supplement to information
from the other two. Such data should be
accompanied by comments on it from the
instructor.

B. There must be careful examination of
the quality and professionality of content
and process: the three qualities here are
currency, correctness, and comprehensiveness.
Ratings must be made on the basis of a
sample of 1) the materials provided, 2) the
texts required and recommended, 3) the exams,
4) the term paper topics, 5) the student
performances on the preceding, 6) the
instructor's performance in grading student
work, 7) the instructor's performance in
justifying the grade and providing other
helpful feedback.

C. Actual learning gains. These are only
going to be useful if one can in advance
specify and justify the comparisons that will
be used. By themselves, they have no legitimate
interpretation except, perhaps, where
they are negative. In cases of multiply
taught sections for an instructional course,
the comparison between sections can be
extremely useful. In cases where there are
national norms they can sometimes be useful.
And with respect to instructional improvement,
one can run a comparison
against one's own previous performance
and discover whether slight or large changes
in approach or text turn out to have
significant results.

The main aim of the comparisons involved
is to try to determine what could reasonably
have been taught to the students:
it's always fairly straightforward to find out
what was actually taught, but that is
evaluatively useless without some sense of what
was possible. If allocation of students between
the afternoon sections of a large intro
class is random, then one can indeed discover
something quite important about this,
if one is prepared to look for patterns that
extend across a couple of years. Shorter periods
of study are likely to yield results due
to unrecognized idiosyncrasies of the particular
classroom, time, groups present, etc.

D. Professional Development Dossier
(Self-evaluation and Self-improvement
information.) Although the results of
self-improvement should show up eventually in
improved performance on the above scales,
it's worth including it directly since doing so
encourages faculty efforts in this direction,
and hence speeds up the improvement; and it also
improves the validity of decisions that have
to be made under a time constraint. Under
this heading we expect the submission of a
rationale for each course's methods and
coverage (where the latter is the instructor's
option), evidence about professional development
activity such as readings, courses,
workshops, consulting aimed to improve
teaching, and most importantly, a description
of planned and performed experiments
on the individual instructor's own teaching
approach. A basic list of courses taught,
environments, and grades; and of committees
(instructional, departmental, school and
college) should be included here, preferably
as hard-copy output from the institutional
data base.

The preceding list of four dimensions
covers the four crucial types of data that
should be gathered. There are ways in which
these can be ramified usefully and without
significant extra cost. For example, feedback
from the students should be supplemented
by feedback from teaching assistants
or teaching aides where these are
employed, and where confidentiality of the
responses can be preserved (as is usually the
case). I have sometimes previously included
a separate ethical component or a "justice
dimension" (as I then called it), but I have
here amalgamated it into "professional
content and process."

For faculty self-development purposes,
the questionnaire can of course be expanded
to include an open-end question in
which there is a call for identifying particularly
good and particularly bad features of
the instruction, but I usually keep these
responses down in volume (because it's so
hard to simplify large numbers of these
responses) by requesting that they be provided
only when either the top or the bottom
two scoring categories are used (i.e. only
when an A or a D or an F rating of the
instructor is given by the student). This also
improves their utility, because it avoids the
need to interpret, e.g., favorable
responses with long lists of criticisms
attached.

In addition to the ratings of the instructor
done by students during the class, there is
another type of student rating that is
exceptionally valuable, at least when done
sporadically, and that is the "exit" interview
done at graduation. It falls midway between
the in-class rating and the alum rating, and
is vastly better than the alum rating because
the rate of return can be around 100 percent
instead of around 30 percent (rendering the
latter essentially useless for personnel
decisions), memories are more reliable at that
point, it refers to a more recent version of
the courses, and causal inferences are more
likely to be valid at that point. Exit
interviews have a quite disproportionately large
proportion of the students, but who do not
show up in any other search.

Notice that the approach outlined here
does a great deal to protect faculty from two
sources of injustice that are particularly
pervasive in current systems. The first is the
kind of injustice that makes it impossible
for the teacher of a generally unpopular but
absolutely essential prerequisite course, e.g.
calculus for architecture students, to score
really well compared to the norms against
which she or he will usually be judged. We
make allowances in the above system for at
least three ways in which such an instructor
will be able to emerge as first-rate. The
other kind of injustice relates to the individuals
who put an enormous amount of work
into developing a course, keeping it up to
date, and making it as effective a teaching
device as they possibly can, at the expense
of lots of happy friendly humanistic
interchanges and socializing. When an evaluation
system is issued which does not pay a
great deal of attention to the content of the
course, and which throws in irrelevant
requests to have students rate the course on
"touchy-feely" dimensions, these efforts are
largely unrewarded.

Although peers (one kind of expert)
must be used for evaluating content, they
are usually not competent to handle all that
falls under B, although they should be. For
there is one kind of usually hidden behavior
that should be picked up by an evaluation
system, and which does require a different
kind of expertise to assess, and this is the
technical competence of the instructor in
pedagogy, e.g. in writing reasonable examination
questions that are unambiguous, involve
adequate coverage of the intended
target area, are not over-cued, etc. This kind
of quality is rather easily picked up by
somebody with good skills in test construction,
and such a person should routinely review
samples of the tests of those instructors
that are up for (or a year from) particularly
important personnel decisions like tenure.

Another point at which such experts have
some relevance concerns the way in which
exams are graded or marked. There are
professionally required standards here, with
which virtually no faculty members at
universities have the slightest familiarity. As a
remedy for this, administrators should
request that, as a normal part of the process
of talking about self-improvement,
the instructor fill out a form indicating how
papers are in fact graded. The kind of grading
that is legitimate involves at least the
following requirements: the papers are
graded "blind," e.g. by having the students
write their names inside the back cover of a
blue book or in any of a dozen other more
foolproof ways; the papers are graded question
by question, not paper by paper (this is
to avoid the known large halo effect that results
from having read a good or bad first
question by the same author just before
reading the second question); when one has
graded all the nth questions in a given set of
papers one must then go back and grade the
first six to ten papers again in order to see
whether one's grading standards have
slipped, upwards or downwards (as they
often do); the pile of papers should be
shuffled after each question is graded so
that different students stand the brunt of
one's fatigue (or initial optimism) for different
questions; and so on. With multiple
choice exams, many of the preceding
requirements are otiose; but the requirement
for technical proficiency in posing them
becomes enormously more important and is
something that is beyond the competence of
nearly all academic instructors at the moment.
(It's illuminating to talk to people at
the Educational Testing Service about the
difficulty of finding individuals who can
write good multiple choice items.) I think one
can tie in these practices with those required
by the "Black Marks List" and call them the
Professionality Dimension (E). But they can
also be left buried in A and B.
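
The grading requirements just listed can be
pictured as a reading order. The following is
only a sketch under stated assumptions, not a
procedure taken from any institution: the name
grading_schedule, the default of six re-read
papers (the low end of "six to ten"), and the
fixed random seed are all hypothetical. Papers
are referred to by anonymous codes, which stands
in for "blind" grading.

```python
import random


def grading_schedule(paper_codes, num_questions, recheck_count=6, seed=0):
    """Return a list of (question, paper_code, is_recheck) in reading order.

    Hypothetical helper: grades question by question, reshuffles the pile
    before each question, and re-reads the first few papers of each pass
    so the grader can check whether standards have drifted.
    """
    rng = random.Random(seed)
    pile = list(paper_codes)  # anonymous codes only, never student names
    schedule = []
    for question in range(1, num_questions + 1):
        rng.shuffle(pile)  # different students bear the grader's fatigue
        for code in pile:
            schedule.append((question, code, False))
        # go back over the first papers read for this question
        for code in pile[:recheck_count]:
            schedule.append((question, code, True))
    return schedule
```

A grader would work down the returned list in
order, comparing each re-read mark with the mark
first given to the same paper on that question.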

Much of the preceding refers to summative
evaluation for personnel decisions. For
that purpose, it is quite important to give
out the class questionnaires at about the
same time in the term for all instructors.
What is the best time? Faculty at USF argue
that the tenth through the twelfth weeks of
class (semester system) would be about
right; not so near to the beginning as to give a
reading on the basis of inadequate experience,
not so near to the end as to interfere
with the intensive review period for the
examinations or to run into the occasional drop in
attendance while students stay at home. Getting
all the classes visited within this period
is logistically feasible for a relatively small
campus, but would involve substantial difficulties
in a large one. But validity considerations
suggest that administration of questionnaires
after final grades have been seen is
preferable, and this also increases headroom
on the distribution. See section VI for more
on this approach.

One could turn to the use of senior
departmental secretaries to distribute and
collect questionnaires, but there is still a
credibility problem about that. It is essential
not to use graduate students or students
from the same campus, for similar reasons.
The system of having each class elect
a student who will do the job and take the
materials over to the central administrative
offices has worked quite well at some
institutions. But it does not provide a person
who can answer questions and exhibit inside
knowledge about the importance of the
process and its results. Under no circumstances
can one go to mail ballots or "drop
it in the nearest collection box" ballots,
because the response rate deteriorates
seriously, and it's hard to justify validity with
under 85% return rates, especially if variable.
One must select the best procedure in
terms of the accessible resources at a
particular site.

Similarly, although it is rather easy to use
optically scannable cards, this gets one into
the business of providing the appropriate
marking pencils, which is to be avoided at
all costs. It's better to use straightforward
forms and have them keypunched; it turns
out that this approach, rather surprisingly,
is less expensive. The computer program
for combining the data on the type of
questions that we have been talking about is
of course extremely simple and can provide
a variety of interesting readouts. After
investigating variations in small and large
classes, required and optional classes, upper
and lower division classes and differences
between fields, one usually finds that most
of these differences are not sufficiently
important to be worth reporting separately,
although one can run them through the machine
for a check every time the system is
operating. One should also have the computer
automatically flag performances that are
either '4 standard deviation above or one
and two standard deviations below the
averages, not because a standard deviation
means something in terms of traditional
significance with the kind of skewed distribution
you get here, but because it's a convenient
and quite appropriate flag. If one
does not use a set of preliminary questions
on components or professionality, one will
probably have to set the upwards flag at
half or two-thirds of a standard deviation in
order to get any success stories at all.
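
The flagging rule can be sketched in a few
lines. This is an illustration under stated
assumptions rather than the program actually
used: the function name flag_performances and
the label strings are hypothetical, the upward
threshold is set to half a standard deviation
(one of the values mentioned above for forms
without preliminary questions), and the
population standard deviation is used because
every instructor in the readout is included.

```python
from statistics import mean, pstdev


def flag_performances(ratings, up=0.5, down=1.0, far_down=2.0):
    """Label each instructor's mean rating relative to the group average.

    ratings: dict mapping an instructor label to a mean overall rating.
    Thresholds are in standard deviations; defaults are assumptions.
    """
    scores = list(ratings.values())
    avg = mean(scores)
    sd = pstdev(scores)
    flags = {}
    for name, score in ratings.items():
        if sd == 0:
            flags[name] = "none"  # no spread, nothing to flag
            continue
        z = (score - avg) / sd
        if z >= up:
            flags[name] = "above"
        elif z <= -far_down:
            flags[name] = "far below"
        elif z <= -down:
            flags[name] = "below"
        else:
            flags[name] = "none"
    return flags
```

The flag is a convenience for drawing attention
to a readout, as the text says, and carries no
claim of statistical significance.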

How often should this process be run? In
my view, every second course or every third
course given, stratified by upper and lower
division and graduate categories, is a good
compromise. Moreover, it cuts the personnel
and time requirements for distributing
and collecting the forms by a half to
two-thirds, besides giving faculty a chance to do
something about an unsatisfactory readout
before the next evaluation is upon them.

There are certain by-products of this
activity that are of some interest to
administrators. For example, one can get a
very good readout on the actual teaching
load of the faculty, in terms of numbers of
classes and in terms of numbers of live bodies;
it turns out that there are sometimes
a startling number of "phantom" classes, i.e.
classes that are not in fact meeting although
they haven't been cancelled, usually because
there weren't enough people to justify a
scheduled class meeting and the instructor
didn't want to convey to the chair or dean
the fact that her or his load had partly
evaporated. A comparison of the number of
grades awarded with the number present at
the time of the questionnaire's administration
will give an attendance rate ratio,
which is also something that bears watching
for faculty members who are trying to
improve their performance. The number present
for the evaluation should not be printed
out on the summary sheets without a reference
to the number absent, if the registrar's
computer can be brought to produce a current
figure on that. Printing out some index
of bi-modal distributions is also easy, but
should in general be reserved for the situation
where someone is looking for help.
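
As a small illustration of the attendance
readout described above, the following sketch
computes the ratio and prints the number present
together with the number absent. The function
names and the wording of the summary line are
hypothetical, and the enrollment figure is
assumed to come from the registrar.

```python
def attendance_ratio(present_at_evaluation, grades_awarded):
    """Ratio of students present for the questionnaire to grades awarded.

    Returns None when no grades were awarded, since the ratio is undefined.
    """
    if grades_awarded <= 0:
        return None
    return present_at_evaluation / grades_awarded


def summary_line(present, enrolled):
    """Report the number present alongside the number absent."""
    absent = max(enrolled - present, 0)
    return f"{present} present, {absent} absent (of {enrolled} enrolled)"
```

A class with an enrollment on the books but
nobody present for the questionnaire would show
a ratio of zero, which is one way a "phantom"
class might surface in the readout.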

VI. How to Use Faculty Evaluation in a
Faculty Development Process

Most of the preceding refers fairly
specifically to summative evaluation; it is time for
us to say something more specific about
formative evaluation. In general, there is no
need at all for this to involve the kind of
rigorous supervision of questionnaire
administration we've been talking about to
date. Nor need it be done at any particular
scheduled time. In my classes I have
frequently used the following eccentric
procedure. Evaluation forms are distributed to
everybody that comes into the class the
moment they walk in the door on the first or
second or third day while the tourists are
still shopping around. The forms request
the student to turn them in by putting them
on the front desk, or by giving them to
another student to turn in, if they decide not
to continue with the course after even one
or two sessions. The reason for this is that
one wants to know whether there was something
misleading about the advertising,
needs-irrelevant about the content or
off-putting about the early presentations, and
this is the best possible way to pick this up.
Perhaps there was something about the way
that one was outlining the proposed examination
and assignment process that seemed
particularly formidable. One won't pick
these objections up from people that left
because of them and hence aren't there toward
the end when the usual forms are distributed.
Indeed, as you reflect upon the usual
process, you see that it is heavily biased in
favor of the instructor because of the highly
self-selected population that stays around.
One is losing much of the critical and negative
feedback, and I personally find that I
learn a good deal from it, though I won't
pretend that I like reading all of it.

With respect to students who stay around,
one next requests that all of them should feel
free to submit anonymous suggestions and
criticism at any time throughout the term.
One needs to make very careful and specific
arrangements about this, to preserve their
anonymity and to encourage them to see
that one is going to do something with the
results: e.g. issue feedback forms and pass a
collection box around once a week. After
the midsemester, one requests that a full
component analysis form be submitted with
perhaps a half dozen questions on it related
to how fair the exams and grades were, the
coverage to date, the treatment of students
raising questions in class, etc. Now one has
the chance to show that this input really is
useful, by discussing it and taking whatever
steps seem best to improve the class in the
light of it. This is also a very good time to
demonstrate that some of the criticisms are
mutually contradictory and hence that it
isn't easy to satisfy everybody. Next one
applies to the faculty Senate or the appropriate
administrator for permission to give an
early final. The final is given during the last
class (or two classes) and is corrected in
time to be returned at the time originally
scheduled for the official final in finals
week. Attendance at the session is required,
as it would be for a final, upon penalty of
receiving an incomplete in the course, and
possibly a grade penalty. At that last session,
the exam questions are gone over in
front of the class, an answer key with
"model answers" is handed out, a sheet of
actual and unsatisfactory answers that illustrate
certain types of mistakes is handed out
with the appropriate commentary on it, an
opportunity is provided to raise objections
to the grade with the instructor and perhaps
with teaching assistants, and then the students
are required to hand in a final evaluation
form on the class. The students who
had been optimistically supposing that they
were going to get an A and, in the light of
this, had given the instructor quite a good
grade in the tenth, eleventh or twelfth week,
are now suddenly confronted with reality
and their evaluation of the instructor is
often affected. Too bad, but one cannot
expect them to provide as good a judgment
of the course prior to receiving a grade for it
as subsequently. Moreover this arrangement,
apart from improving the evidential
basis of the student evaluation and hence
probably the validity of their ratings,
makes the exams part of the learning process,
not an arrow shot into the air and falling
to earth one knows not where except
for a grade. Improvement and effort should
be directed by the most accurate evaluation
possible, whether or not the institution uses
the best system. That's the first point about
formative evaluation (evaluation for
development).

In the course of formative evaluation, it is
entirely appropriate to ask questions of the
students about style, where one is striving to
achieve a particular style. But it is even
more appropriate (because of known validity)
to ask some microquestions to identify
problem areas such as the text, handouts,
quizzes, mid-term assignments, grading
process, treatment of questioners, availability
in office and after class, and final exams.
There are no style assumptions lying behind
the belief that it is better to perform well in
each of these areas. Whereas it is certainly
possible to have all instructors ask everybody
about these matters, i.e. build them
into the general form, the consequent increase
in the load on the computer is not
only costly but less effective because the
rigid time for collection of the data does not
provide one the opportunity to see whether
improvements have been effectuated that
seem to work well with the class. A one-shot
feedback is never as useful formatively as a
sequence of feedbacks.

If an instructor is getting bad ratings
overall that do not seem to be explicable in
terms of, for example, the peculiarity that
the course is required for non-majors, then
she or he should certainly go to the kind of
components analysis form that I have just
mentioned. Professional consulting at this
point will often suggest a number of ways to
improve performance. Only if all of that
fails should one consider going to an analysis
of style and a discussion of alternative
possible approaches to that, because the results
of style research make this a last resort
and there are still other non-style reasons to
worry about first, such as the giving of
indiscriminate A's, which may better serve to
precipitate a request or requirement that the
instructor alter a particular teaching process
because it's unprofessional.

I think that most people who have the
requisite content knowledge are capable of
becoming rather good instructors; but I
don't think that many of them do so, and I
don't think that all of the faculty (even
those with tenure) either could or will become
or remain reasonable instructors.
Hence the evaluation system needs to be
one which provides a solid basis for
proceeding first to put pressure on an unsatisfactory
instructor to upgrade performance,
and second to remove them either from the
faculty or from the instructional faculty if
the first result is not possible. We are now
moving away from the legal model of the
"master-slave" relationship in colleges and
towards the legal model of "just cause" for
dismissal, as a result of the affirmative action
legislation and the gradual development
of an orientation toward collective
bargaining. So previous fears about violations
of academic freedom, which were the
most important basis for tenure, are somewhat
less of a worry and can certainly be
taken care of via protective contracts,
whether formal or informal. The residual
infamy of tenure, due to the number of people
who are "burned out" but kept on by it,
stands as a proclamation of lack of responsibility
that is becoming increasingly prominent
as the hard times for higher education
develop. We can't afford to continue that
way, and we can't move any other way
without a rock solid process for evaluating
teaching, and for improving it, which must
be the court of first resort.

VII. Integrating and Disseminating the
Results of Faculty Evaluations

While the results of formative feedback
go to the instructor alone, or to some
consultant that is chosen by the instructor, the
summative evaluations from students go to
the responsible first level administrator.
They must be combined with the information
in the files, direct measurement of
learning gains (if available) and appropriate
comparisons, ratings by teaching assistants,
the quality ratings made by topic experts of
the content and by process experts of the
examining and certain other professional
processes. In addition, considerations of
worth must be brought in for the appropriate
decisions. The integrative process is a
very tricky one and all the preceding work
may be wasted if bias can slip in here, as it
usually can and does. Without getting into
the minutiae of the topic, it is not easy to
give a complete outline of what should
happen at this point. But the ideal toward
which one should strive is clear enough: the
integrative process should be a simple linear
weighting and summing procedure. In this
respect it should be the same as the procedure
for combining the results of the teaching
evaluation with the results of research
and service evaluations. Extreme rigidity in
the process is far better (although not necessary)
than allowing the department chair or
the dean to do a holistic "seat of the pants"
synthesis of the data at this point, a synthesis
which is a principal way for bias to come
in.

There are of course a variety of situations
in which one does not want to use equal
weighting for all these components, but it
should be a matter of extreme openness
how the actual weights are determined and
what they are, and where it is possible to
arrange for individual variations in the
"contract."
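
A weighting and summing procedure of the kind
recommended here is easy to state explicitly.
The sketch below is only an illustration: the
function name combine_components and the
component labels in the usage note are
hypothetical, and the weights shown are not
proposed values. The point is that the weights
are written down in advance and applied
mechanically.

```python
def combine_components(scores, weights):
    """Weighted average of component scores using openly stated weights.

    scores and weights are dicts keyed by the same component labels.
    """
    if set(scores) != set(weights):
        raise ValueError("every component needs both a score and a weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight
```

For example, with scores for "student_ratings",
"content_quality", "learning_gains" and
"development_dossier" and a weight of 1 on each,
the result is the plain average; an individual
"contract" would simply substitute a different,
equally public, set of weights.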

What we have sketched above is a rather
incomplete version of what is only a fairly
complicated procedure. It's a perfectly
comprehensible procedure and can readily
be made a rather equitable one. One of the
results to be avoided at all costs is the
pressure to go with different evaluative
approaches by different departments or
schools. It may be a useful political procedure
to start off with that as a possibility
in order to get people on board
who would rather die than admit that their
teaching process in a lab or a clinic has
anything in common with the teaching process
in a philosophy seminar or a basic statistics
course, but the real truth of the matter
is that any such view is complete
nonsense. There is nothing about any of the
procedures recommended here that will not
work perfectly well for any course offered
on a university campus, from the school of
music to molecular biology. One may want
to add an extra question or two here, drop
one or two there, especially from the formative
evaluation procedures, and perhaps
even from the "Black Marks List." But these
are very minor changes and the simplest
way is to put all of the questions on all of
the forms that are generally distributed,
with a "not applicable" option for the students.
The backup use of other forms, with
reference to style, where necessary or desired,
may certainly be made as idiosyncratic
as the instructor pleases, and the departments
may have a preferred version of
these. But those forms should not get into
the personnel review process, even at the
departmental level, without both legal and
ethical expert opinion.

VIII. The Key Ethical/Legal Constraint on
Data Gathering

The underlying reason for taking such a
firm line about this point is not merely the
absence of any scientific evidence that connects
particular approaches to teaching with
successful results; it is a much graver matter.
The error in racism, or sexism, does not
lie in the empirical falsehood of the claim
e.g. that the crime rate amongst blacks is
higher than it is amongst whites, or that
management experience is much less likely
to be present in a woman than it is in a man;
both these and many other similar generalizations
are true. But their truth fails to justify
the appeal to them in order to make
a personnel decision adverse to a black or
woman candidate. And their truth does not
justify such an appeal, not because it
happens to be illegal, but because it happens
to be scientifically indefensible. Generalizations
like this are very much less reliable as
predictors than inferences made from track
records of the individual candidate. It is
additionally and ethically unquestionable that
any use of such generalizations would lead
to self-fulfilling and socially undesirable
results, so on that additional ground they
must be abhorred. But one does not need
the additional ground in order to see that,
no matter what the results of research on
teaching styles ever turn out to be, one will
never be able to use those results for making
decisions about individual instructors
because to do so would be a case of guilt by
association. (One may well ask what the
point of research on teaching styles is, if it
can never be used in this way. The best
answer is that it can increase our repertoire
of promising options.)

A final point on the ethical side. I do not
think that it is ethical, and I suspect it may
not even be legal, to deny the students who
generate the key evaluation judgments in
this whole process the opportunity to see the
summarized results. One might certainly
argue against it with respect to some detailed
written-in remarks, because those cannot
properly be used without balancing them
against all the rest, and doing that in a
systematic way is something that students may
not be in a position to do; but the holistic
responses and their distribution are something
one cannot properly withhold from
the students. In addition to questions of
propriety, there is the further thoroughly
unattractive possibility that the students,
frustrated in their attempt to get these, will
set up their own evaluation system and one
will then either have to cooperate with it,
which increases the chance of questionnaire
overload with a reduction in response rate
and validity, as well as a loss of class time, or
take the extremely unattractive position that
students are not allowed to poll their peers in
order to evaluate instructors.

Some compromises will be necessary because
the kind of evaluation of instructors
that students are interested in doing is as
different from the kind that we have been
talking about as formative is from summative.
Students are interested in such questions
as the assignment-mode, the cost of
the textbooks and whether they are really
necessary, and certain teaching style variables
which can well be an appropriate basis
for the student with a certain learning style
to use in selecting in or out of a particular
class. But compromises of this kind should
be made and can be made in the interest of a
campus-community approach. Student morale
is not well served by leaving them out of
this process, and in the long run faculty
evaluation and faculty improvement suffer
from not having input from these student
points of view.

IX. Conclusion

Personnel evaluation in general is a difficult
field and one that is usually done with
shocking incompetence, as one can readily
discover by studying the forms used by the
White House, or by large corporations or
the Armed Services. If it is to be improved,
presumably the academies should
be the source of the leadership. It is they,
after all, who do most of the relevant research
and interpretation. In the key matter of the
evaluation of teaching at the college level
they have shown a disgraceful disinclination
to get into it, and in recent years when that
distaste has partly evaporated, a disgraceful
lack of capacity to do it in a defensible way.
Perhaps the above remarks will prove
sufficiently irritating to lead to improvement,
even if they are not themselves accepted as
suggesting appropriate improvement.